Background / Context:

Description of prior research and its intellectual context. 


The growing availability of data linking students to classroom teachers has made it possible to 
estimate the contribution teachers make to student achievement. By nearly all accounts, this 
contribution is large. Estimates of the impact of a one standard deviation (s.d.) increase in 
teacher “value-added” range from 0.10 to 0.30 s.d. in both math and reading. These effects have 
been documented in locations as diverse as Texas (Rivkin, Hanushek, & Kain, 2005), North 
Carolina (Clotfelter et al., 2006; Goldhaber & Hansen, 2012; Rothstein, 2010), Chicago 
(Aaronson, Barrow, & Sander, 2007), Florida (Harris & Sass, 2006; McCaffrey et al., 2009), 
New Jersey (Rockoff, 2004), San Diego (Koedel & Betts, 2009), Los Angeles (Buddin & 
Zamarro, 2008; Kane & Staiger, 2008), and elsewhere (Jacob, Lefgren, & Sims, 2010; Papay, 
2011). The magnitude of these effects suggests a student assigned to an effective teacher will 
experience nearly a full year's more growth than a student assigned to an ineffective teacher. 

Galvanized by these findings, policymakers at the state and school district level have moved to 
incorporate value-added measures into teacher evaluation systems. President Barack Obama and 
U.S. Secretary of Education Arne Duncan have embraced this approach and pressured states to 
use value-added measures as significant criteria in the promotion, compensation, and dismissal 
of educators. By some estimates, the potential for such policies to raise teacher quality, student 
learning, and economic growth is substantial (e.g., Chetty et al., 2011; Hanushek, 2009). 

While there is a growing consensus that teacher quality is important and current evaluation 
systems are inadequate, many have expressed concerns over the use of value-added measures 
(VAMs) in high-stakes personnel decisions. These concerns are often grounded in VAMs’ 
statistical imprecision and possible susceptibility to bias (Briggs & Dominique, 2011; Corcoran 
& Goldhaber, 2013; Harris, 2011). Because teachers are not randomly assigned to students, 
VAMs are plausibly biased by the presence of other unmeasured student, class, or school 
influences on achievement (Rothstein, 2010). Moreover, VAMs are imprecise, with a substantial 
proportion of the variation in achievement across classrooms attributable to student and 
classroom-level noise (McCaffrey et al., 2009; Schochet & Chiang, 2013). While research on 
these issues is ongoing, the prevailing view appears to hold that these limitations are not 
significant barriers to the use of VAMs in evaluating teacher performance, if done with 
appropriate caution (Glazerman et al., 2010, 2011). 

Purpose / Objective / Research Question / Focus of Study: 

Description of the focus of the research. 


We conduct a new test of the validity of teacher value-added models. We apply conventionally
estimated VAMs to an outcome on which teachers cannot plausibly have a causal effect:
student height. Any estimated “effect” of teachers on height should raise questions about the 
extent to which VAMs cleanly distinguish between effective and ineffective teachers. We also 
examine two potential interpretations for effects of teachers on height. The first is that these 
effects reflect bias: students may be sorted to teachers on the basis of unobserved factors related
to height (which may or may not also be related to achievement). The second is that these effects reflect measurement
error or other forms of random “noise.” Both have implications for the use of VAMs in practice. 


Setting: 

Description of the research location. 

All students in grades 4 and 5 in New York City public schools between 2007 and 2010, and
kindergarten students in the Early Childhood Longitudinal Study - Kindergarten Cohort
(ECLS-K), a national study of more than 20,000 children in the kindergarten class of 1998.

Population / Participants / Subjects: 

Description of the participants in the study: who, how many, key features, or characteristics. 


We use two data sources for estimating teacher effects on achievement and height. The first is a 
panel of all students in grades 4-5 in New York City public schools between 2007 and 2010 
(approximately 473,000 student-year observations). Each student is linked to their English 
Language Arts (ELA) and mathematics teacher, and to annual “Fitnessgram” measurements of 
their height. The second is the Early Childhood Longitudinal Study - Kindergarten Cohort
(ECLS-K). ECLS-K students are linked to their classroom teachers and were tested in reading and
math at the beginning and end of the school year. Trained assessors additionally measured 
participants' height in both the fall and spring of the kindergarten year. We use approximately 
9,200 students from the ECLS-K who shared a classroom teacher with at least 4 other students. 

These two data sources contribute to our analysis in different ways. The NYC data cover a
large population of students and teachers over four years. The number of students observed per
teacher is relatively large, allowing for precise estimates of teacher effects and estimation across
multiple years of data. The ECLS-K sample is smaller, but height in this sample is less likely to
be measured with error. Another advantage of the ECLS-K is that there is likely to be less sorting
within schools to teachers on the basis of unobserved factors related to achievement
(or height). To the extent non-random sorting of students is a problem in typical value-added
models (e.g., Rothstein, 2010), this should be less of a concern in kindergarten (and thus in the
ECLS-K), where teachers should know less about incoming students. The ECLS-K achievement
scores are continuous, normally distributed, and free of ceiling effects, while NYC scores
have a lumpier distribution. The ECLS-K also provides a national sample. Finally, the ECLS-K
conducted extensive teacher and parent interviews, which allow us to control for more family
background characteristics than are typically available in administrative data sets.

Intervention / Program / Practice: 

Description of the intervention, program, or practice, including details of administration and duration. 

In the traditional estimation of teacher value-added on student achievement, individual teachers 
are treated as non-experimental “interventions” for which impacts can be estimated. Value-added 
models assume that systematic variation in achievement across teachers can provide an unbiased 
estimate of teachers’ causal effects on test performance, after controlling for prior achievement 
and a limited set of covariates. In our NYC analysis, 4th- and 5th-grade teachers are the
“intervention” of interest. In the ECLS-K, kindergarten teachers are the intervention of interest. 
Research Design:

Description of the research design. 


For each NYC outcome (math and ELA achievement, height) we estimate teacher effects using a 
standard value-added model that controls for the prior year outcome, year indicators, and student 
covariates (gender, race/ethnicity, recent immigrant and LEP status, special education status, 
eligibility for free or reduced-price lunch, and borough). Models for height also include age-by-gender
interactions and race-by-gender interactions. Teacher effects are alternately estimated
assuming random or fixed effects. The random effects are the Empirical Bayes estimates, or best
linear unbiased predictors (BLUPs), of the teacher effects; we adjust the fixed effects to account for
sampling variation. In practice, the random and fixed effect estimates are very similar.
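A generic way to write the NYC specification is shown below. The abstract does not state the equation, so this notation is our assumption: Y_ijt is the outcome (math, ELA, or height) of student i taught by teacher j in year t, X_it collects the student covariates, gamma_t are year effects, and theta_j is the teacher effect. The second line sketches one common adjustment of fixed effects for sampling variation, in which the average squared standard error across the J teachers is subtracted from the raw variance of the estimated effects; it is illustrative and may differ from the exact adjustment used in this study.

```latex
\begin{align*}
Y_{ijt} &= \lambda Y_{i,t-1} + X_{it}\beta + \gamma_t + \theta_j + \varepsilon_{ijt} \\
\hat{\sigma}^2_{\theta} &= \widehat{\operatorname{Var}}\left(\hat{\theta}_j\right)
  - \frac{1}{J}\sum_{j=1}^{J} \widehat{\operatorname{SE}}\left(\hat{\theta}_j\right)^2
\end{align*}
```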

Because teacher effects may be biased by school-level influences on the outcome that are
correlated with teacher assignment (e.g., leadership quality or other resources), we also estimate
teacher effect models that include school effects. This is done in two steps: we first regress the
current outcome on all covariates and school effects, and then estimate teacher effects from
the residuals. (We note that while value-added models with school effects are common in the
research literature, such models are often deliberately avoided in practice.)

For ECLS-K outcomes we estimate a model similar to that for NYC, but using spring and fall 
measures as the current and lagged dependent variables, respectively. Covariates include gender, 
race/ethnicity, and an indicator for whether the child speaks a language other than English at 
home. Teacher effects are again estimated assuming random or fixed effects, and we account for 
the complex sampling design of ECLS-K by using the kindergarten panel weight, and in the 
random effects models, the school weight as well. 

We begin by summarizing the magnitude of the estimated teacher effects on each outcome, 
measured by the s.d. of the random or fixed effect. These are compared across outcomes (e.g., 
height and achievement) and model specifications (e.g., models with school effects and without). 
We then examine how teachers’ effects are correlated across outcomes. A correlation between 
value-added in height and in achievement could be indicative of sorting to teachers on the basis 
of unobserved factors related to each. We compute within-teacher correlations in effects across
years as a measure of the persistent component of teacher effects. Finally, as an alternative
test for the role of noise, we re-estimate each model after randomly allocating observed student
data across teachers and schools. This eliminates any possibility of non-random sorting
of students to teachers, so any remaining “effects” should be pure noise.
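The logic of the random-reallocation check, and of the cross-year correlation, can be illustrated with simulated data. This sketch is not the authors' procedure and uses made-up parameters (200 teachers, 25 students each, a true teacher-effect s.d. of 0.2 on a unit-variance student outcome). Student outcomes are shuffled across teachers so that neither sorting nor true teacher effects can survive; teacher means computed after shuffling should still vary, with an s.d. near the pure sampling-error benchmark of about 1/sqrt(25) = 0.2, while their correlation across two independent shuffles (standing in for two years) should be close to zero. All function names are hypothetical.

```python
# Hypothetical simulation of a random-reallocation ("noise only") check.
import random
import statistics


def teacher_means(outcomes, teacher_ids):
    """Mean outcome of each teacher's students, keyed by teacher id."""
    groups = {}
    for y, t in zip(outcomes, teacher_ids):
        groups.setdefault(t, []).append(y)
    return {t: statistics.fmean(ys) for t, ys in groups.items()}


def pearson(xs, ys):
    """Plain Pearson correlation (avoids needing Python 3.10's statistics.correlation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def reallocation_check(n_teachers=200, class_size=25, true_sd=0.2, seed=1):
    rng = random.Random(seed)
    teacher_ids = [t for t in range(n_teachers) for _ in range(class_size)]
    effect = {t: rng.gauss(0.0, true_sd) for t in range(n_teachers)}
    # Assumed data-generating process: true teacher effect + unit-variance student noise.
    outcomes = [effect[t] + rng.gauss(0.0, 1.0) for t in teacher_ids]
    observed = teacher_means(outcomes, teacher_ids)

    # Two independent random reallocations stand in for two "years" in which
    # there is no sorting and no true teacher effect.
    year1, year2 = outcomes[:], outcomes[:]
    rng.shuffle(year1)
    rng.shuffle(year2)
    m1 = teacher_means(year1, teacher_ids)
    m2 = teacher_means(year2, teacher_ids)
    teachers = sorted(observed)
    return {
        "sd_observed": statistics.pstdev(list(observed.values())),
        "sd_reallocated": statistics.pstdev([m1[t] for t in teachers]),
        "corr_across_reallocations": pearson([m1[t] for t in teachers],
                                             [m2[t] for t in teachers]),
    }


if __name__ == "__main__":
    for name, value in reallocation_check().items():
        print(f"{name}: {value:.3f}")
```

The spread that survives reallocation is pure noise, yet on this scale it is about as large as the true effect s.d. assumed in the simulation.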

Data Collection and Analysis: 

Description of the methods for collecting and analyzing data. 


All NYC data come from administrative databases provided by the New York City Department
of Education. Restricted-use ECLS-K data were obtained from the National Center for Education
Statistics under a licensing agreement that ensures data anonymity and security. Our procedure
for analyzing the data is summarized in the Research Design section above.

Findings / Results: 

Description of the main findings with specific details. 
In both data sources we find that teacher effects on height are nearly as large as teacher effects
on math and reading achievement. For instance, we find that a one s.d. increase in teacher
“value-added” to the height of New York City 4th graders is associated with students who are
0.23 s.d. taller. This compares with effects of 0.28 and 0.25 s.d. on achievement in math
and ELA, respectively. Models that include school effects reduce the magnitude of these
estimates, although they remain large at 0.16-0.17 s.d. We find very similar effects of teachers on
height in the ECLS-K. We find no direct evidence of bias in achievement VAMs: the correlation
between teachers’ value-added in height and in achievement is near zero. Similarly, the
correlation in teacher effects on height across years is near zero, an indication that there
is little or no persistent component to teacher effects on height. Nevertheless, the extent of noise
in these models appears to be large. Even when randomly assigning student data to teachers and
schools, we continue to find a significant “effect” of teachers on achievement.

Conclusions: 

Description of conclusions, recommendations, and limitations based on findings. 


These findings raise important questions about the extent to which VAMs cleanly distinguish 
between effective and ineffective teachers. Our finding of a near-zero correlation between a 
teacher’s value-added on height and her value-added on achievement offers some comfort that 
VAMs for achievement are not biased by the presence of omitted variables associated with 
height. Moreover, our finding of a near-zero correlation in a teacher’s effect on height across 
years suggests that the “signal” contained in these effects is negligible. However, our results do 
show that the “noise” in value-added models is substantial. When fitting traditionally-estimated 
value-added models to an outcome on which teachers cannot plausibly have an impact, we find
“effects” that are as large as those found for outcomes that teachers do affect. 

Taken together, our results provide a cautionary tale for the interpretation and use of teacher 
VAM estimates in practice. We find that, simply due to chance, teacher effects can appear
large, even on outcomes teachers cannot plausibly affect. The implication is that many value-added
studies likely overstate the extent to which teachers differ in their effectiveness, although further 
research is needed. Furthermore, users of VAMs should take care to ensure their estimates reflect 
the signal component of teacher effectiveness and are not driven by noise. This is especially 
important when personnel and compensation decisions are tied to individual VAM estimates. 
While most contemporary value-added systems do adjust VAMs for noise (applying a 
“shrinkage” factor), the procedure for doing so is not standard, and the quality of these 
adjustments is unclear. 
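As an illustration of what such a shrinkage adjustment involves, the sketch below applies one common Empirical Bayes-style rule. It assumes a known student-level noise variance, estimates the signal variance by the method of moments (raw variance of the estimates minus their average sampling variance, floored at zero), and pulls each raw estimate toward the grand mean by its reliability. The function name and inputs are hypothetical; actual value-added systems differ in exactly these choices.

```python
# Hypothetical Empirical Bayes-style shrinkage of raw teacher effect estimates.
import statistics


def shrink(raw_effects, class_sizes, noise_var):
    """Return (estimated signal variance, shrunken effects)."""
    # Expected sampling variance of each raw teacher mean (assumes known noise_var).
    sampling_var = [noise_var / n for n in class_sizes]
    # Method-of-moments signal variance, floored at zero: if raw estimates vary no
    # more than noise alone would predict, there is no detectable signal.
    signal_var = max(0.0, statistics.pvariance(raw_effects) - statistics.fmean(sampling_var))
    grand = statistics.fmean(raw_effects)
    shrunk = []
    for est, sv in zip(raw_effects, sampling_var):
        # Reliability = share of a teacher's estimated variance that is signal.
        reliability = signal_var / (signal_var + sv) if signal_var > 0 else 0.0
        shrunk.append(grand + reliability * (est - grand))
    return signal_var, shrunk


if __name__ == "__main__":
    signal_var, adjusted = shrink([0.3, -0.3, 0.1, -0.1], [25, 25, 25, 25], noise_var=1.0)
    print(round(signal_var, 4), [round(v, 3) for v in adjusted])  # 0.01 [0.06, -0.06, 0.02, -0.02]
```

When the raw estimates vary no more than sampling error alone would predict, the estimated signal variance is zero and every teacher is shrunk to the mean.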